LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation

Eunsu Kim; Juyoung Suk; Seungone Kim; Niklas Muennighoff; Dongkwan Kim; Alice Oh

doi:10.18653/v1/2025.findings-acl.1357

Abstract

We introduce LLM-as-an-Interviewer, a novel paradigm for evaluating large language models (LLMs). This approach leverages multi-turn interactions in which the LLM interviewer actively provides feedback on responses and poses follow-up questions to the evaluated LLM. At the start of the interview, the LLM interviewer dynamically modifies datasets to generate initial questions, mitigating data contamination. We apply the LLM-as-an-Interviewer framework to evaluate six models on reasoning, factuality, and instruction-following tasks. Our results show that the framework effectively provides insights into LLM performance, including the quality of initial responses, adaptability to feedback, and the ability to address follow-up queries such as requests for clarification or additional knowledge. The framework also addresses key limitations of conventional methods like LLM-as-a-Judge, including verbosity bias and inconsistency across runs. Finally, we propose the Interview Report, which aggregates insights from the interview process, providing examples and a comprehensive analysis of the LLM’s strengths and weaknesses. This report offers a detailed snapshot of the model’s real-world applicability.

Anthology ID: 2025.findings-acl.1357
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 26456–26493
URL: https://aclanthology.org/2025.findings-acl.1357/
DOI: 10.18653/v1/2025.findings-acl.1357
Cite (ACL): Eunsu Kim, Juyoung Suk, Seungone Kim, Niklas Muennighoff, Dongkwan Kim, and Alice Oh. 2025. LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 26456–26493, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation (Kim et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.1357.pdf
